Goto

Collaborating Authors

 survival data





SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis

Brockschmidt, Marie, Schröder, Maresa, Feuerriegel, Stefan

arXiv.org Artificial Intelligence

Survival analysis is a cornerstone of clinical research by modeling time-to-event outcomes such as metastasis, disease relapse, or patient death. Unlike standard tabular data, survival data often come with incomplete event information due to dropout, or loss to follow-up. This poses unique challenges for synthetic data generation, where it is crucial for clinical research to faithfully reproduce both the event-time distribution and the censoring mechanism. In this paper, we propose SurvDiff, an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. SurvDiff is tailored to capture the data-generating mechanism by jointly generating mixed-type covariates, event times, and right-censoring, guided by a survival-tailored loss function. The loss encodes the time-to-event structure and directly optimizes for downstream survival tasks, which ensures that SurvDiff (i) reproduces realistic event-time distributions and (ii) preserves the censoring mechanism. Across multiple datasets, we show that \survdiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and downstream evaluation metrics across multiple medical datasets. To the best of our knowledge, SurvDiff is the first diffusion model explicitly designed for generating synthetic survival data.


Enhanced Survival Trees

Zhou, Ruiwen, Xie, Ke, Liu, Lei, Xu, Zhichen, Ding, Jimin, Su, Xiaogang

arXiv.org Machine Learning

We introduce a new survival tree method for censored failure time data that incorporates three key advancements over traditional approaches. First, we develop a more computationally efficient splitting procedure that effectively mitigates the end-cut preference problem, and we propose an intersected validation strategy to reduce the variable selection bias inherent in greedy searches. Second, we present a novel framework for determining tree structures through fused regularization. In combination with conventional pruning, this approach enables the merging of non-adjacent terminal nodes, producing more parsimonious and interpretable models. Third, we address inference by constructing valid confidence intervals for median survival times within the subgroups identified by the final tree. To achieve this, we apply bootstrap-based bias correction to standard errors. The proposed method is assessed through extensive simulation studies and illustrated with data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.


CoxNTF: A New Approach for Joint Clustering and Prediction in Survival Analysis

Fogel, Paul, Geissler, Christophe, Luta, George

arXiv.org Artificial Intelligence

The interpretation of the results of survival analysis often benefits from latent factor representations of baseline covariates. However, existing methods, such as Nonnegative Matrix Factorization (NMF), do not incorporate survival information, limiting their predictive power. We present CoxNTF, a novel approach that uses non-negative tensor factorization (NTF) to derive meaningful latent representations that are closely associated with survival outcomes. CoxNTF constructs a weighted covariate tensor in which survival probabilities derived from the Coxnet model are used to guide the tensorization process. Our results show that CoxNTF achieves survival prediction performance comparable to using Coxnet with the original covariates, while providing a structured and interpretable clustering framework. In addition, the new approach effectively handles feature redundancy, making it a powerful tool for joint clustering and prediction in survival analysis.


Counterfactual Survival Q Learning for Longitudinal Randomized Trials via Buckley James Boosting

Lee, Jeongjin, Kim, Jong-Min

arXiv.org Machine Learning

We propose a Buckley James (BJ) Boost Q learning framework for estimating optimal dynamic treatment regimes under right censored survival data, tailored for longitudinal randomized clinical trial settings. The method integrates accelerated failure time models with iterative boosting techniques, including componentwise least squares and regression trees, within a counterfactual Q learning framework. By directly modeling conditional survival time, BJ Boost Q learning avoids the restrictive proportional hazards assumption and enables unbiased estimation of stage specific Q functions. Grounded in potential outcomes, this framework ensures identifiability of the optimal treatment regime under standard causal assumptions. Compared to Cox based Q learning, which relies on hazard modeling and may suffer from bias under misspecification, our approach provides robust and flexible estimation. Simulation studies and analysis of the ACTG175 HIV trial demonstrate that BJ Boost Q learning yields higher accuracy in treatment decision making, especially in multistage settings where bias can accumulate.



Predicting the Lifespan of Industrial Printheads with Survival Analysis

Parii, Dan, Janssen, Evelyne, Tang, Guangzhi, Kouzinopoulos, Charalampos, Pietrasik, Marcin

arXiv.org Artificial Intelligence

Personal use of this material is permitted. This paper has been published in the 8th IEEE Conference on Industrial Cyber-Physical Systems (ICPS) in Emden, Germany, May 12-15, 2025. Abstract --Accurately predicting the lifespan of critical device components is essential for maintenance planning and production optimization, making it a topic of significant interest in both academia and industry. In this work, we investigate the use of survival analysis for predicting the lifespan of production printheads developed by Canon Production Printing. Specifically, we focus on the application of five techniques to estimate survival probabilities and failure rates: the Kaplan-Meier estimator, Cox proportional hazard model, Weibull accelerated failure time model, random survival forest, and gradient boosting. The resulting estimates are further refined using isotonic regression and subsequently aggregated to determine the expected number of failures. The predictions are then validated against real-world ground truth data across multiple time windows to assess model reliability. Our quantitative evaluation using three performance metrics demonstrates that survival analysis outperforms industry-standard baseline methods for printhead lifespan prediction.


Deep Learning-Based Survival Analysis with Copula-Based Activation Functions for Multivariate Response Prediction

Kim, Jong-Min, Ha, Il Do, Kim, Sangjin

arXiv.org Machine Learning

This research integrates deep learning, copula functions, and survival analysis to effectively handle highly correlated and right-censored multivariate survival data. It introduces copula-based activation functions (Clayton, Gumbel, and their combinations) to model the nonlinear dependencies inherent in such data. Through simulation studies and analysis of real breast cancer data, our proposed CNN-LSTM with copula-based activation functions for multivariate multi-types of survival responses enhances prediction accuracy by explicitly addressing right-censored data and capturing complex patterns. The model's performance is evaluated using Shewhart control charts, focusing on the average run length (ARL).